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1.   INTRODUCTION 

Several  algorithms  have  been  developed  for  the  direct  solution 
of  finite  difference  approximations  of  the  Poisson  and  biharmonic  equa- 
tions on  a  rectangle  [l-3],  [k,5].      For  square  meshes  of  n  x  n  points,  the 
fastest  available  sequential  algorithms  require  0(n2log2n)  time  steps  for 
the  Poisson  equation,  and  0(n3log2n)  time  steps  for  the  biharmonic  equa- 
tion.  Using  as  many  parallel  processors  as  necessary,  one  could  hope  to 
achieve  a  solution  to  either  of  these  problems  in  0(log2n)  time  steps.   In 
fact,  it  follows  from  a  simple  fan  in  argument  that  any  algorithm  which 
requires  the  interaction  of  n2  data  points  with  each  other  cannot  be  com- 
puted in  less  than  21og  n  time  steps. 

We  are  concerned  here  with  parallel  computation  and  shall  mea- 
sure time  in  unit  steps.   If  p  processors  are  being  used,  then  we  denote 
the  computation  time  as  Tp  unit  steps.   Thus  T±   is  the  time  required  by  a 
serial  machine.   We  define  the  speedup  of  a  computation  using  p  processors 
over  the  serial  computation  time  as  S  =  T  /T  >  1.   Further,  we  define 

the  efficiency  of  a  p  processor  computation  as  E  =  S  /p  <  1.   We  also 

y        p 
assume  that: 

(i)   any  number  of  processors  may  be  used  at  any  time,  but  we 
will  give  bounds  on  this  number, 

(ii)   each  processor  may  perform  any  of  the  four  arithmetic 
operations  in  one  time  step,  and 

(iii)   there  are  no  memory  or  data  alignment  time  penalties. 
In  this  paper  we  develop  a  parallel  algorithm  for  the  Poisson 
equation  which  uses  only  0(log2n)  time  steps  and  0(n2)  processors.  Thus 


2 
we  achieve  a  speedup  of  0(n  )  over  the  fastest  known  sequential  algorithms, 

at  an  efficiency  which  is  independent  of  n.   Furthermore,  we  present  a 

parallel  algorithm  for  the  biharmonic  equation  which  uses  only  0(n)  time 

3  2 

steps  and  0(n  )  processors  for  a  speedup  of  0(n  log  n)  over  the  fastest 


known  sequential  algorithm.   We  also  present  a  slower,  more  efficient 

52 


algorithm  which  uses  0(nlog  n)  time  steps  and  only  0(n  )  processors,  for 


2 

a  speedup  of  0(n  ). 


2.   PRELIMINARY  RESULTS 


Now  we  introduce  several  lemmas  which  will  be  fundamental  in 
obtaining  the  main  results  of  this  paper.   Throughout  section  2,  we  assume 
n  =  2   for  a  positive  integer  k. 


Lemma  1 

The  coefficients  y. ,  l<i<n,  of  the  Fourier  transform  of  the  vec- 
tor of  complex  data  points  x. ,  l<i<n,  can  be  computed  in  T  =31og0n  steps 
using  p=2n  processors  with  a  speedup  S  =5n/3  and  an  efficiency  E  =5/6. 

The  details  of  the  Fast  Fourier  Transform  have  been  discussed 
by  a  number  of  authors,  e.g.  [6-8].   Hence,  we  only  sketch  a  proof  that 
31og  n  steps  are  sufficient.   Each  of  the  log  n  stages  in  the  FFT  algo- 
rithm requires  the  multiplication  of  n/2  pairs  of  complex  numbers,  followed 
by  the  addition  of  n  pairs  of  complex  numbers.   Thus,  2n  processors  are 
required  to  form  the  Mn/2)  products  in  one  step,  and  n  processors  can  sum 
the  n/2  real  and  n/2  imaginary  parts  simultaneously  in  another  step.   The 
complex  addition  of  n  pairs  of  numbers  will  require  2n  processors  for  the 
third  step.   Since  the  serial  time  required  for  this  process  is  T  =  5n  log_n 
steps,  we  achieve  a  speedup  of  S  =  T  /T  =  5n/3,  with  an  efficiency  of 
Ep  =  Sp/p  =  5/6. 


Corollary  1 

The  coefficients  y. ,  l<i<n,  of  the  Fourier  transform  of  the 

vector  of  real  data  points  x. ,  l£i<n,  can  be  computed  in  T  =3  log  (n/2) 

steps  using  p  =  n  processors  with  a  speedup  S  =  5n/6(l  - ),  and  an 

P  l°g0n 


efficiency  E  =  5/6(1  - ). 

P  log2n 

This  follows  from  Lemma  1  and  the  fact  that  n  real  points  may 
be  regarded  as  n/2  complex  points  in  the  FFT  algorithm,  [9]. 


Lemma  2 


Any  triangular  system  Ax  =  f  of  order  n,  where  A  is  either 


2 
unit  lower  or  upper  triangular,  can  be  solved  in  T  <  (log  n  +  3  log  n)/2 

steps  using  p  <_  n  /6h   processors,  with  a  speedup  S  =  0(n^/log  n)  and  an 

2 
efficiency  E  =  0(l/n  log  n). 

This  is  Theorem  1  in  [10],  hence  the  proof  will  be  omitted  here 


Lemma  3 

Let  A  in  Lemma  2  be  a  bidiagonal  matrix  (a. .=0  for  li-i  >2). 
Then  the  system  Ax  =  f  may  be  solved  in  T  =2  log  n  steps  using  p  =  (n-l) 
processors  with  a  speedup  S  =  (n-l ) /log  n  and  an  efficiency  E  =  l/log_n. 

This  is  a  corollary  of  Theorem  2  in  [l0]. 

Lemma  k 

Let  A  be  a  nonsingular  symmetric  matrix  of  order  n.   Then  it  can 

be  factored  in  the  form  A  =  LDL11  in  T  =  3 (n-l)  steps  using  p  =  n(n-l)/2 

P 

processors,  where  L  is  unit  lower  triangular  and  D  is  a  diagonal  matrix  with 
nonzero  elements.   Thus,  we  obtain  a  speedup  S  =  n(2n+5)/18  with  an 

2 
efficiency  -5-  <  E  <  1  . 


Proof 


Let  L  =  Ur*2,...,*n],  where  I*  =  (0.. ..  ,0,1,  Vl.k'W.k'"  ••ln,k,« 

n       t 
hence  A  =  A  -   £   d.£.£..   At  the  beginning  of  the  k-th  stage  we  have 

(d  ,d_,...,cL   ),  [Z-  ,Jlp, . . .  ,  JL  _  ) ,  and  the  symmetric  matrix  A^  =  A  -  E  d. .  Jl .  Jl . 

i=l 

(k)  (k) 

whose  first  (k-l)  rows  and  columns  are  zero.   Thus  cL  =  a,   ,  and  I       =  a   /cL 

(r  =  k+l,...,n)  are  obtained  by  one  division  using  (n-k)  processors.   We 

then  obtain  the  matrix  cL  JL  JL  in  one  multiplication  using  (n-k)  (n-k+l)/2 

processors,  and  A    =  A  -  till     in  one  subtraction  using  the  same  number 

of  processors.   Therefore,  3  time-steps  are  required  in  the  k-th  stage  using 

(n-k)  (n-k+l)/2  processors.   Hence,  the  whole  factorization  requires  T  =  3(n-l) 

P 

steps  with  p  =  n(n-l)/2  processors. 

The  sequential  algorithm  requires  n(n-l) (2n+5)/6  steps,  thus  we 
obtain  the  above-mentioned  speedup  and  efficiency. 


3.   A  POISSON  SOLVER 

We  consider  the  five-point   finite  difference  approximation  of 
the  Poisson  equation 

-V2!!  =  f  (1) 

on  a  unit  square.   We  superimpose  a  square  grid  over  the  unit  square  with 
a  mesh  size  h  =  l/n  and  assume  that  log  n  =  2  for  a  positive  integer  k. 
In  fact,  the  results  of  this  paper  hold  for  n  =  2  ,  but  for  later  notational 
simplicity  we  make  the  above  restriction.   Considering  the  periodic  boundary 
conditions: 


u(xQ,  y)  =  u(xn,  y) 
u(xs  yQ)  =  u(x,  yn), 


(2) 


we  are  led  to  the  linear  system, 


where 


and 


M  v  =  w   , 
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-I     A 
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t   t 
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/  t   t      t> 
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. .  .  ,  w 
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f    2  2^ 

(n   x  n  ), 


(3) 


A  = 


k  -1 


-1 
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1   h         -l 
\ 

s-i  "N  u 


S       V       V 


N         S  V 


-1 


-1 


(n  x  n. 


The  matrix  A  has   the   eigenvalues   A     =   k   +  2cos— "—  ,    (j   =  0,    1,   2,    ...,   n-1 ) , 

and  the    corresponding  eigenvectors   are 
1 


t 

qo 


^ 


(1,    -1,    1,    ...,    -1), 


q£  -  -~  (1,   1,   1,    ..-,   1), 
2        ^ 


t  /2    ,  2£tt  kin  on    , 

q„    =     /—  (cos  ,    cos  ,    ....    cos   2£tt)_ 

H       V  n  n  n  ' 


and 


f2    ,    .      2£tt  .      kH-n 

—  (sin  ,    sin 


,   sin  2&ir)* 


where   £  =  1,2,    ...,—-   1.      Let  Q  be  the   eigenvector  matrix  of  A,   v.    =  Q  v.  , 
and  w.    =   Q  w.    (i   =   1,    2,    . . . ,   n).      Thus,   the   solution  of  system   (3)    can  be 
reduced  [3]  to  that  of  solving  the  n  systems  of  linear  equations. 


r  .v,  =  v. 


(j   =  0,   1,   2,    ...,  n-1), 


(k) 


where 


and 


r.  = 
J 


A .        -1  -1 


-1        A  .        -1 

\\  \ 

-1        A  .        -1 


-1 


-1        A. 

J  -J 


vr  (v  vj2'  •■••  V' 


(nxn), 


in  which  v   and  w   are  the  j-th  components  of  v  and  w  ,  respectively, 
algorithm  proceeds  as  follows  in  three  stages: 


The 


i)  Compute  w.  =  Q  w. 


(i  =  1,  2,  ...  ,  n), 


ii)   Solve  the  n  systems 


r.v,  =  w 

J  J     J 

iii)   Compute  v.  =  Qv. 


(j  =  0,  1,  2,  ...,  n-1), 
(i  =  1,  2,  ...  ,  n). 


With  this  theoretical  "background  for  the  Poisson  equation,  we 
now  turn  to  the  development  of  an  algorithm  which  computes  a  solution  in 
0(log9n)  time  steps.   This  will  be  developed  by  analyzing  p  and  T  for 
stages  i  -  iii  above.   The  result  is  presented  below  as  Theorem  1. 

We  describe  an  algorithm  that  capitalizes  on  Lemmas  i  and  2  so 

as  to  require  a  time  within  a  small  multiple  of  the  time  required  by  a  fan  in 

2 
argument  on  n  points.   We  give  the  number  of  time  steps  and  processors  re- 
quired in  each  stage  for  the  major  operations  whose  computation  time  depends 
on  n.   In  other  words  we  will  ignore  operations  which  can  be  computed  in 
constant  time.   Figure  1,  below,  gives  a  time-processor  chart  for  the 
parallel  Poisson  solver. 
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3  log2n 


2  log2n 


jL 


l°g2n 
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2  log2n 


f 


3  log2n 


-jn/log  n 




w.    =   Q  w. 
l                l 

zj 

a. 
J 

v.    =  Qv. 

1 

stage  i,   i=l,  2,  ...,n 


stage  ii,   j  =  0,  1,  ...,  n-1 


stage  iii,  i  =  1,  2,  ...,  n 


Figure  1 


From  the  eigenvectors  q  and  q    ,  of  the  matrix  A  we  see  that 

v.  =  Q  w.  will  give  rise  to  discrete  cosine  and  sine  transforms  each  of  n/2 
1      1 

real  data  points.   By  Corollary  1,  each  of  these  can  be  computed  by  an  FFT 

algorithm  of  n/U  complex  data  points;  thus  we  can  evaluate 

(w   , ...,  w   lo      ,  w    ,9xl)...,  w     )  in  3  log  (n/U )  steps  using  n 
i,l       i,n/£!-l   i,n/^+±       i,n-l  2 

processors.   Using  (n/log  n)  additional  processors,  we  can  simultaneously 


evaluate  w 


1 


n-1 


,   n-1 

,  =  I      (-1)J  w.  .  and  w 

•n  j=0         'd      i,-   /n  j=0 


Z  w.  .  in 


(3  logpn  -  logp  logpn)  steps.   Therefore,  stages  i  and  iii  require 

2    2 
3  log_n  steps  each  with  (n  +  n  /log_n)  processors  as  shown  in  Fig.  1. 

In  stage  ii  we  solve  the  linear  systems  (h)   using  Toeplitz 

factorization  in  conjunction  with  the  Woodburry  formula,  [ll].   The  matrix  T 

can  be  considered  as  the  sum  (B.  +  SV.),  where 

J  J 


j 


-1  X .        -\ 

B.    =         \  J  \  ! 

\\  \l 

-1        A.        -1 


-1        X.  (nxn), 

oJ 


S  =    [e    ,e    ],    and 


lu.-i    o- 

t         i    J 

V.    =    j 

J        -1        o- 


0   -1   i 

0      0   !        (2xn). 


The  matrix  B     can  then  be   factored  into  the  product  LU  ,  where 
j  J   J 


^j-1 


u.  =! 


-l 


1      ,    and  L. 


W  ■ 


-1 

N 
S 

Ns 

I 

y 


J      ! 


2  1  /2 

in  which  y     =    (X   /2)    +    ((X    A)      -  l)         ,    and  for  numerical  stability  of 

the   factorization  the   sign  is    chosen  such  that    |y.|    =    |x    /2 1    +    ((X./1+)   -l ) 

J  J  J 

Thus, 

v.  =  rT    w. 


1/2 


=  uT^lT1*    -  l^s  (i    +  v*u:1L:1sr:Lv*u,:1L"1i,]. 

J         J      J  J  2  J    j      J  J    J      J      j 


Solving  the   systems   L.G.   =  S   and  U.H.   =  V.    is   rather  straightforward; 

J    J  J    J  J 


and 


Hj   =  [(PJ%   -yjlen»'   "Ilgj]' 


where 


t        ,_        -1        -2  -n+h 

g     =    (1,    y       ,    y      ,    ...  ,    y  ) 

J  J  J  J 


We  assume  that   such   a  vector  may  "be  computed  in  constant   time.      The  major 

operations   in  computing  the  v.'s   are  those   involved  in   solving  the   systems 

J 


L.z.    =  w.,    and  U.v.    =  a.  , 
J   J  J  J    J  J 


(5) 


in  which 


a.    =  z.    -  G.(I„   +  H*0.)   1H^z. 
J  J  J2  jj  jj 


(6) 


By  Lemma  3  we  see  that  solving  each  of  the  systems  in  (5)  requires  2  logpn 
steps  and  (n-l)  processors.   Since  g.g.  =  (l-y.   )/(l-y.  ),  then  the  only 

J   J  J  J 

major  operation  in  (6)  is  that  of  computing  the  inner  product  g.z..   This 

J   J 

requires  log  n  steps  and  n  processors.   Stage  ii,  therefore,  requires  a 

2 
total  of  5  logpn  steps  with  n  processors  as  shown  in  Fig.  1. 


10 


Thus  the  total  number  of  time  steps  required  for  this  algorithm 

2    2 
is  11  log  n  with  n  +  n  /logpn  processors.   Similarly  we  can  show  that  for 

Neumann  boundary  conditions  the  same  numbers  of  steps  and  processors  are 

required,  while  for  the  Dirichlet  boundary  conditions  we  require  only 

2 
10  log  n  steps  with  n  processors,  since  stage  ii  needs  only  k   log  n  steps. 

The  following  theorem  is  therefore  established, 


Theorem  1 

The  five-point  finite  difference  approximation  of  the  Poisson 
equation  on  the  unit  square  with  mesh  size  h  =  l/n  can  be  solved  in  at  most 

T  =  11  log„n  steps  using  no  more  than  p  =  n(n+[n/log0nl )  processors,  with 

P         2  « 

a  speedup  S  =  0(n  )  and  an  efficiency  9/44  <_  E  <   9/20.  O 

Comparing  our  parallel  algorithm  with  the  odd-even  reduction  scheme 
[3] ,  we  obtain, 

S  =  |  n2log2n/ll  log2n  =  9n2/22,   9/44  <  E  £  9/20 
where  the  upper  bound  is  achieved  for  the  Dirichlet  boundary  conditions. 


We  use  [xl  to  denote  the  smallest  integer  greater  than  or  equal  to  x. 


11 


k.      A  BIHARMONIC  SOLVER 

We  consider  the  "biharmonic  equation, 

k 
V  u  =  f 

on  a  unit  square  with  the  boundary  conditions  u(x,  y)  =0  and  u  (x,  y)  =  0 

for  (x,  y)  e  9R,  where  u  (x,  y)  is  the  outward  normal  derivative  at  the 

boundary.   Superimposing  a  square  mesh  on  the  unit  square  with  mesh  size 

h  =  l/(n+l),  where  n  =  2  ,  then  a  finite  difference  approximation  leads  to 

the  linear  system 


M  v  =  w, 


(T) 


where 


"b 


A+I   2B   I 

2B    A    2B   I 

I     2B   A    2B   I 


"1    2B   A    2B   I 
I    2B   A    2B 
I    2B   A+I 


f    2  2\ 
(n  xn  )3 


in  which 


B  = 


-k        1 
1  -U   1 


\   s 
\   \ 
\   \ 


1-Ul 
1  -h 
t 


A  =  B  +  21  +  2EE 


(nxn), 


=  V  +  2EE  ,  and 


■*- 


1   o 0   0 

o  o 0  1 


(2xn) 
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The   eigenvalues   of  B  are   given  by 

in 


w.    =  -h   +  2   cos  - 
l  n+1 


(i   =  1,   2,    ...,   n), 


and  the   eigenvector  matrix, 


<» -[«„!■ 


n+1 


sin 


1  JTT 

n+1 


System  (?)  can  be  expressed  [5]  as 


where 


(G  +  2FF  )v  =  w, 


(i,  j  =  1,  2,  . . .  ,  n). 


G  = 


V+I   2B   I 

2B    V    2B   I 

I     2B   V    2B 


I    2B   V    2B   I 


\    \   \ 


I    2B   V    2B 
I    2B   V+I 


(n  xn  )  , 


and  F/  2    \  =  diag(E,  E,  ...,  E).   Thus  by  the  Woodburry  formula, 
\Zi   xc.n ) 


v  =  G_1w  -  2G~1F(I^   +  2FtG  XF)  1FtG_1w 

2n 


(8) 


and  the  algorithm  proceeds  as  follows  in  three  stages: 
i)  Solve  the  (2n+l)  linear  systems 
GY  =  F  and  Gv  =  w, 
ii)  Factor  the  symmetric  matrix  {T     +  2F  Y)  in  the  form  LDL 

and  solve  the  linear  systems 

t^  t 

Lz„  =  F  v,  Dz  =  Zp,  and  L  z  =  z  , 

iii)   Obtain  the   solution, 


>\> 


v  =  v  -   2Yz. 
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In  stage  i,  we  would  like  to  solve  (2n+l)  systems  of  the  form 

Gx  =  b  .  (9) 

As  in  the  direct  Poisson  solver,  solving  the  system  (9)  can  be  reduced  to 

that  of  solving  the  n  systems, 

fix  =  b         j  =  1,2,. ..,n  ,  (10) 

J.J        J 


where  the  positive-definite  matrices  fi.  are  given  by 

J 


1+A. 

J 


2u> 


fi.  = 
J 


2u>.    1 
J 


2d) .    1 
J      J 


2w.    A.    2o).    1 

\ 
\     \     \     \ 
\     \     \     \ 
\     \     \     \ 
\     \     \ 


2u 


2w 


2u.    A  . 

J    J 


2u. 


1     2u.    l+X 


J  J 


(nxn)  , 


where 


A.  =  2  +  id, 

j      ; 

c. 

J 


'    (*J1'  5J2» 


in  which, 


,  x.  ) ,  and 


b .  =  (b   ,  b   ,  . . .  ,  b.  )  , 
J     Jl   J2'       jn 


\  "  «*V  \  "  «\ 


k  =  1,  2,  . . .  ,  n 


1/2 


Let  y  =  (u  /2)  -  ((oj  /If)  _  i)  '      ;  then  fi.  can  be  expressed  as 
J     J         j  J 


n.  =  t2  +  s.v,' 

j   j   j  j 


(ii) 


where 


1U 


T.    = 


vi   1 

1       u,     1 
\       ^      N 

\      N     * 

Nl  0)  1 

(J 

1  OJ, 


(n*n), 


in  which 


Sj  =   C(ajel  +   6je2K   V  en]'    md 
Vj   =    Cel'    6je2>    en]' 


2  2 

a.   =  2   +  co.   -   y .    ,    6 .   =  u>.   -   y . 
J  J  J  J  J  J 


T.   has  the   Toeplitz   factorization  L.U.    similar  to  that   of  B     in  Sec.    3, 
j  J   J  J 

where  "both  L     and  U.    are   diagonally   dominant.      Therefore  the  numerical 
J  J 

stability  of  the  process   of  solving  the  linear  systems    (L.U.)  y.   =  f.    is 

assured.      Thus  the   solutions  to  the   systems    (10)  may  he  written  as, 

x     =    (L  U    )"2h      -    (L  U    )~2S.(1„   +  V^(L.U.)"2S.)"V(L.U  J"2"b..      (12) 
j  j   j  j  J    J  j3  J      J   J  J  J      J   J  J 

2 
By  Lemma  3,  solving  the  system  (L  U.)  y.  =  h.  requires  8  log  n  steps 

j  j    j    j  <— 

using  (n-1)  processors.   Furthermore,  since  x  in  Lx  =  e.  (i=l,2,n),  and 

Ux  =  e  can  be  obtained  in  constant  time,  the  solution  of  the  system 
n 

/  \2^  ^  /        N  A 

(L.U.J  S.  =  S.  requires  only  16  log  n  steps  using  (n-1)  processors.   Thus  x. 
J  J   J    J  *-  <J 

in  (12)  is  obtained  in  l6  log  n  steps  with  2(n-l)  processors.   By  virtue  of 

Corollary  1,  the  vectors  b.  and  x.  in  (ll)  are  obtained  in  6  log  (n/2)  steps 

J      J  e- 

using  n  processors.   Therefore,  the  linear  system  (9)  can  be  solved  in 

2 
22  log  n  steps  using  2n  processors.   Consequently,  stage  i  requires  22  log  n 


steps  provided  we  use  2n  (2n+l)  processors. 
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By  Lemma  U,  the  matrix  (I   +  2F  y)  can  be  factored  in  the  form 

LDL  in  3(2n-l)  steps  using  n(2n  -  l)  processors.   By  Lemma  2,  we  require 

2 
(log22n  +  3  log22n)  steps  to  solve  the  three  linear  systems  in  stage  ii  using 

3  2 

n  /8  processors.   Thus,  stage  ii  requires  (6n  +  log  n  +  5  log  n  +  1)  steps  with 


'2 


o 

max  (n(2n  -  l),  n  /8)  processors.  Finally,  stage  iii  requires  log  n  steps  using 

3 
2n  processors. 

Hence  we  have  established  the  following  theorem: 

Theorem  2 

The  finite  difference  approximation  (7)  of  the  biharmonic  equation 

2 
on  the  unit  square  can  be  solved  in  at  most  T  =  6n  +  logQn  +  28  log  n  +  1  steps 

.M  *~ 

2  2 

using  p  =  2n  (2n  +  l)  processors,  with  a  speedup  S  =  0(n  log  n),  and  an 

P         "  ^ 

efficiency  E  =  0(log  n/n).Q 

For  large  n,  T  o6n  and  plk^n  .   Comparing  this  with  the  sequential 

algorithm  in  [5],  we  obtain  a  speedup  of  no  less  than, 

3  5   2 

S  =  5n  log0n/6n  =  —  n  log  n 
p         2      6       2 

with  an  efficiency  of, 

Ep  =  5  log2n/24n  . 

Note  that  for  n^2  ,  Lemma  2  requires  at  most  i+n  processors. 

3 
For  large  n,  the  requirement  of  n  processors  may  be  impractical.   If  we 

wish  to  limit  the  number  of  processors,  the  above  algorithm  can  be  shown 

2 
to  require  0(n  log  n)  steps  with  0(n  )  processors. 

This  can  be  explained  as  follows.   If  we  have,  for  example 

2 
n  processors,  then  let  us  solve  the  top  k  equations  of  the  linear  sys- 

2    ~i  2/3 

terns  in  stage  ii  where  by  Lemma  2  n  =  k  /6U,  i.e.  k  =  kn        .      We  can 

then  compute  the  partial  sums  of  the  first  k  terms  of  the  remaining  (n-k) 

equations  as  a  new  constant  vector  for  the  remaining  triangular  system. 
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This  process  can  be  repeated 
and  it  takes 


2n 


times  until  the  whole  system  is  solved, 


T   "o 
P  ~ 


2n 


[(log2k  +  3  log2k)/2  +  log2k] 


=  0(n1/3log^n). 

For  a  moderate  range  of  n,  T  =  0  (n).   With  0(n  )  processors,  now, 
stage  i  can  also  be  evaluated  within  0(n  log  n)  steps  by  computing  the 
(2n  +  l)  systems  sequentially;  stage  iii  will  still  have  T  =  0(log?n). 
Hence,  by  this  procedure,  we  can  obtain  a  somewhat  slower  but  more  effi- 
cient  algorithm  with  speedup  of  0(n  )  and  an  efficiency  again  indepen- 
dent of  n. 


17 


REFERENCES 

[1]       Dorr,  F.  W.   The  direct  solution  of  the  discrete  Poisson  equation 

on  a  rectangle.   SI AM  Review  12,  2  (1970),  248-263. 
[2]       Buzbee,  B.  L.   A  fast  Poisson  solver  amenable  to  parallel  computa- 
tion.  IEEE  Trans.  Comput.  C-22,  8  (1973),  793-796. 
[3]       Buzbee,  B.  L. ,  Golub,  G.  H. ,  and  Neilson,  C.  W.   On  direct  methods 

for  solving  Poisson' s  equations.   SIAM  J.  Numer.  Anal. 

7,  4  (1970),  627-656. 
[4]       Ehrlich,  L.  W.   Solving  the  biharmonic  equation  in  a  square:   a 

direct  vs.  a  semidirect  method.   Comm .  ACM  6,  11  (1973), 

711-714. 
[5]       Golub,  G.  H.   An  algorithm  for  the  discrete  biharmonic  equation. 

Private  communication. 
[6]       Brigham,  E.  0.,  and  Morrow,  R.  E.   The  fast  Fourier  transform. 

IEEE  Spectrum,  (Dec.  1967),  63-70. 
[7]       Pease,  M.  C.   An  adaptation  of  the  fast  Fourier  transform  for  parallel 

processing.   J.  ACM  15,  2  (April  1968),  252-264. 
[8]       Ackins,  G.  M. ,  and  Kuck,  D.  J.   Seismic  signal  processing  via  the 

ILLIAC  IV  computer.   IEEE  Trans.  Geo.  Electron.  GE-7, 

(Jan.  1969),  34-41. 
[9]       Cooley,  J.  W. ,  Lewis,  P.  A.  W. ,  and  Welch,  P.  D.   The  fast  Fourier 

transform  algorithm:   programming  considerations  in  the 

calculation  of  sine,  cosine  and  Laplace  transforms. 

J.  Sound  Vib.  12,  3  (1970),  315-337. 


18 


[10]      Chen,  S.  C. ,  and  Kuck,  D.  J.   Time  and  parallel  processor  bounds 

for  linear  recurrence  systems.   Submitted  to  IEEE  Trans. 

Comput . 
[11]      Fischer,  D. ,  Golub ,  G. ,  Hald,  0.,  Leiva,  C. ,  and  Widlund,  0.   On 

Fourier-Toeplitz  methods  for  separable  elliptic  problems. 

Math.  Comp.  28,  126  (1974),  349-368. 


BIBLIOGRAPHIC  DATA 
SHEET 

1.    Report  No. 

UIUCDCS-R-74-684 

2. 

3.  Recipient's  Accession  No. 

4.  Title  and  Subtitle 

PARALLEL  DIRECT  POISSON  AND  BIHARMONIC   SOLVERS 

5.   Report  Date 

July  1974 

6. 

7.  Author(s) 

A.   H.    Sameh,    S.    C.    Chen  and  D.    J.    Kuck 

8.    Performing  Organization  Rept. 

No-  UIUCDCS-R-74-684 

9.   Performing  Organization  Name  and  Address 

University  of  Illinois  at  Urbana-Champaign 

10.   Project/Task/Work  Unit  No. 

Department  of  Computer  Science 
Urbana,   Illinois     61801 

11.  Contract/Grant  No. 

US  NSF-GJ-36936 

12.  Sponsoring  Organization  Name  and  Address 

National   Science  Foundation 

13.  Type  of  Report  &  Period 
Covered 

Washington,   D.    C.      20550 

14. 

15.  Supplementary  Notes 

16.  Abstracts 

We  consider   the  direct  solution  of   the  Poisson  and  biharmonic  equations  using  a 
number  of   arithmetic  units   in  parallel.      Assume  an  n*n  grid  of  mesh  points.      We 

show  that  a  finite  difference  approximation  to   the  Poisson  equation  can  be  solved  in 

2 
0(log9n)    time  steps  using  0(n   )   processors   in  parallel.      This  holds   for  Dirichlet, 

Neumann  and  periodic  boundary  conditions.     We  also  show  that  a  finite  difference 

approximation  to  the  biharmonic  equation  can  be  solved  in  0(n)    time  steps  using 

3 
0(n  )   processors  in  parallel. 

17.   Key  Words  and  Documem 

Analysis.     17a.   Descriptors 

Biharmonic  Equation 
Computational  Complexity 
Parallel  Computation 
Poisson  Equation 


Computing  Reviews  Categories 
5.14,  5.17,  5.25. 


17b.   Identifiers/Open-Ended  Terms 


17c.  COSATI  Field/Group 


18.  Availability  Statement 

Unlimited 


19.  Security  Class  (This 
Report) 

TlNn  ASSIFIED 


20.  Security  Class  (This 
Page 

UNCLASSIFIED 


21.   No.  of  Pages 

20 


22.  Price 


ORM    NTIS-35   (  10-70) 


USCOMM-DC    40329-P7  1 


r«-. 
in 


a. 


■ 
HH    H •' 

HHfl  9H 


B! 


UNIVERSITY  OF  ILLINOIS-URBANA 
JlOMILIRno  COO?  no  179  6(4(1974 

Riporl  / 


3  01 


12  0884015 


■HI  '*& 

■ 


■ 


■■  *£d 


■ 

■  V 

■ 

■ 

H 
m        ■■■ 


m 


m 

I 

on 

■ 

■HE  ?>  I 


■H 


i 

■HI 


